Abstract

In general, the distribution of residuals cannot be obtained explicitly. We give an asymptotic formula for the density of Pearson residuals in continuous generalized linear models corrected to order $n^{-1}$, where $n$ is the sample size. We define corrected Pearson residuals for these models that, to this order of approximation, have exactly the same distribution as the true Pearson residuals. Applications for important generalized linear models are provided and simulation results for a gamma model illustrate the usefulness of the corrected Pearson residuals.

Keywords: Exponential family; Generalized linear model; Pearson residual; Precision parameter

1 Introduction

The residuals carry important information concerning the appropriateness of assumptions that underlie statistical models, and thereby play an important role in checking model adequacy. They are used to identify discrepancies between models and data, so it is natural to base residuals on the contributions made by individual observations to measures of model fit. The use of residuals for assessing the adequacy of fitted regression models is nowadays commonplace due to the widespread availability of statistical software, much of which is capable of displaying residuals and diagnostic plots, at least for the more commonly used models. Beyond special models, relatively little is known about asymptotic properties of residuals in general regression models. There is a clear need to study second-order asymptotic properties of appropriate residuals to be used for diagnostic purposes in nonlinear regression models.

The unified theory of generalized linear models (GLMs), including a general algorithm for computing the maximum likelihood estimates (MLEs), is extremely important for the analysis of real data. In these models, the random variables $Y_1,\ldots,Y_n$ are assumed independent and each $Y_i$ has a density function in the linear exponential family

$$\pi(y;\theta_i,\phi) = \exp[\phi\{y\theta_i - b(\theta_i)\} + c(y,\phi)], \tag{1}$$

where $b(\cdot)$ and $c(\cdot,\cdot)$ are known appropriate functions. We assume $Y$ continuous and $\pi$ a probability density function with respect to Lebesgue measure and that the precision parameter $\phi = \sigma^{-2}$, where $\sigma^2$ is the so-called dispersion parameter, is the same for all observations, although possibly unknown. We do not consider the discrete distributions of the form (1) such as Poisson, binomial and negative binomial. For two-parameter full exponential family distributions with canonical parameters $\phi$ and $\phi\theta$, the decomposition $c(y,\phi) = \phi a(y) + d_1(y) + d_2(\phi)$ holds. The mean and variance of $Y_i$ are, respectively, $E(Y_i) = \mu_i = db(\theta_i)/d\theta_i$ and $\mathrm{Var}(Y_i) = \phi^{-1}V_i$, where $V = d\mu/d\theta$ is the variance function. For gamma models, the dispersion parameter $\sigma^2$ is the reciprocal of the index, whereas for normal and inverse Gaussian models, $\sigma^2$ is the variance and $\mathrm{Var}(Y_i)/E(Y_i)^3$, respectively. The parameter $\theta = \int V^{-1}d\mu = q(\mu)$ is a known one-to-one function of $\mu$. A linear exponential family is characterized by its variance function, which plays a key role in estimation.

A GLM is defined by the family of distributions (1) and the systematic component $g(\mu) = \eta = X\beta$, where $g(\cdot)$ is a known one-to-one continuously twice-differentiable function, $X$ is a specified $n \times p$ model matrix of full rank $p < n$ and $\beta = (\beta_1,\ldots,\beta_p)^T$ is a set of unknown linear parameters to be estimated. Let $\hat\beta$ be the MLE of $\beta$.

Residuals in GLMs were first discussed by Pregibon (1981), though ostensibly concerned with logistic regression models, Williams (1984, 1987) and Pierce and Schafer (1986). McCullagh and Nelder (1989) provided a survey of GLMs with substantial attention to the definition of residuals. Pearson residuals are the most commonly used measures of overall fit for GLMs and are defined by $R_i = (Y_i - \hat\mu_i)/\hat V_i^{1/2}$, where $\hat\mu_i$ and $\hat V_i$ are respectively the fitted mean and fitted variance function of $Y_i$. In this paper we consider only Pearson residuals appropriate to our particular asymptotic aims when the sample size $n \to \infty$. Cordeiro (2004) obtained matrix formulae for the expectations, variances and covariances of these residuals and defined adjusted Pearson residuals having zero mean and unit variance to order $n^{-1}$. The Pearson residuals defined by Cordeiro (2004) are proportional to $\sqrt{\phi}$, whereas here we consider $R_i$ as usual, without the precision parameter $\phi$. While Cordeiro's adjusted Pearson residuals do correct the residuals for equal mean and variance, the distribution of these residuals is not equal to the distribution of the true Pearson residuals to order $n^{-1}$.
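As a small illustration of the definition above, the following Python sketch computes Pearson residuals $R_i = (Y_i - \hat\mu_i)/\hat V_i^{1/2}$ for the three continuous families considered in this paper. The function name `pearson_residuals`, the dictionary `VARIANCE_FUNCTIONS` and the toy numbers are illustrative assumptions and do not come from the paper; the fitted means are supplied by hand rather than estimated by maximum likelihood.

```python
import math

# Variance functions V(mu) of the continuous families treated in the paper.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
}


def pearson_residuals(y, mu_hat, family):
    """Return R_i = (y_i - mu_hat_i) / sqrt(V(mu_hat_i)), without the precision parameter."""
    variance = VARIANCE_FUNCTIONS[family]
    return [(yi - mi) / math.sqrt(variance(mi)) for yi, mi in zip(y, mu_hat)]


# Toy data (hypothetical): responses and hand-picked fitted means.
y = [1.2, 0.7, 3.1]
mu_hat = [1.0, 1.0, 2.0]
r_gamma = pearson_residuals(y, mu_hat, "gamma")
r_normal = pearson_residuals(y, mu_hat, "normal")
```

Only the variance function changes between families, which reflects the remark above that a linear exponential family is characterized by its variance function.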

Further, Cordeiro and Paula (1989) introduced the class of exponential family nonlinear models (EFNLMs), which extend the GLMs. Later, Wei (1998) gave a comprehensive introduction to these models. Recently, Simas and Cordeiro (2008) generalized Cordeiro's (2004) results by obtaining matrix formulae for the $O(n^{-1})$ expectations, variances and covariances of Pearson residuals in EFNLMs.

In a general setup, the distribution of residuals usually differs from the distribution of the true residuals by terms of order $n^{-1}$. Cox and Snell (1968) discussed a general definition of residuals, applicable to a wide range of models, and obtained useful expressions to this order for their first two moments. Loynes (1969) derived, under some regularity conditions, and again to order $n^{-1}$, the asymptotic expansion for the density function of Cox and Snell's residuals, and then defined corrected residuals having the same distribution as the random variables which they are effectively estimating. In all but the simplest situations, the use of the results by Cox and Snell and Loynes will require a considerable amount of tedious algebra. Our chief goal is to obtain an explicit formula for the density of Pearson residuals to order $n^{-1}$ which holds for all continuous GLMs.

In Section 2 we give a summary of key results from Loynes (1969) applied to Pearson 
residuals in GLMs. The density of Pearson residuals in these models corrected to order
$n^{-1}$ is presented in Section 3. We provide in Section 4 applications to some common
models. In Section 5 we compare the corrected residuals with the adjusted residuals 
proposed by Cordeiro (2004). We present in Section 6 simulation studies to assess the 
adequacy of the approximations for a gamma model with log link. Some concluding 
remarks are given in Section 7. Finally, in the Appendix, we give a more rigorous proof 
of the general results discussed by Loynes (1969). 



2 Conditional moments of Pearson residuals 

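The $i$th contribution to the score function from the observation $Y_i$ follows from (1) as

$$U_r^{(i)} = \phi V_i^{-1/2}w_i^{1/2}(Y_i - \mu_i)x_{ir},$$

where $w = V^{-1}\mu'^2$ is the weight function and from now on the dashes indicate derivatives with respect to $\eta$. Let $\varepsilon_i = V_i^{-1/2}(Y_i - \mu_i)$ be the true Pearson residual corresponding to the Pearson residual $R_i = \hat V_i^{-1/2}(Y_i - \hat\mu_i)$. Suppose we write the Pearson residual as $R_i = \varepsilon_i + \delta_i$. We can write the following conditional moments given $\varepsilon_i = x$ to order $n^{-1}$ (Loynes, 1969):

$$\mathrm{Cov}(\hat\beta_r,\hat\beta_s \mid \varepsilon_i = x) = -\kappa^{rs},$$

$$b_s^{(i)}(x) = E(\hat\beta_s - \beta_s \mid \varepsilon_i = x) = B(\hat\beta_s) - \sum_{r=1}^{p}\kappa^{sr}U_r^{(i)}(x), \tag{2}$$

where $-\kappa^{sr}$ is the $(s,r)$th element of the inverse information matrix for $\beta$, $B(\hat\beta_s)$ is the $O(n^{-1})$ bias of $\hat\beta_s$ and $U_r^{(i)}(x) = E(U_r^{(i)} \mid \varepsilon_i = x)$ is the conditioned score function. The mean and variance of the asymptotic distribution of $\delta_i$, given $\varepsilon_i = x$, are to order $n^{-1}$

$$\theta_x^{(i)} = E(\delta_i \mid \varepsilon_i = x) = \sum_{r=1}^{p}H_r^{(i)}(x)b_r^{(i)}(x) - \frac{1}{2}\sum_{r,s=1}^{p}H_{rs}^{(i)}(x)\kappa^{rs}, \tag{3}$$

$$\varphi_x^{(i)} = \mathrm{Var}(\delta_i \mid \varepsilon_i = x) = -\sum_{r,s=1}^{p}H_r^{(i)}(x)H_s^{(i)}(x)\kappa^{rs}, \tag{4}$$

where $H_r^{(i)} = \partial\varepsilon_i/\partial\beta_r$, $H_{rs}^{(i)} = \partial^2\varepsilon_i/\partial\beta_r\partial\beta_s$, $H_r^{(i)}(x) = E(H_r^{(i)} \mid \varepsilon_i = x)$ and $H_{rs}^{(i)}(x) = E(H_{rs}^{(i)} \mid \varepsilon_i = x)$. We obtain by simple differentiation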



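$$H_r^{(i)} = -\Bigl\{V_i^{-1/2}\mu_i' + \tfrac{1}{2}V_i^{-3/2}V_i^{(1)}\mu_i'(Y_i - \mu_i)\Bigr\}x_{ir}$$

and

$$H_{rs}^{(i)} = \Bigl[V_i^{-3/2}V_i^{(1)}\mu_i'^2 - V_i^{-1/2}\mu_i'' + \Bigl\{\tfrac{3}{4}V_i^{-5/2}V_i^{(1)2}\mu_i'^2 - \tfrac{1}{2}V_i^{-3/2}V_i^{(2)}\mu_i'^2 - \tfrac{1}{2}V_i^{-3/2}V_i^{(1)}\mu_i''\Bigr\}(Y_i - \mu_i)\Bigr]x_{ir}x_{is},$$

where $V^{(1)} = dV/d\mu$ and $V^{(2)} = d^2V/d\mu^2$. Conditioning on $\varepsilon_i = x$, i.e. setting $Y_i - \mu_i = V_i^{1/2}x$, leads to $H_r^{(i)}(x) = e_i(x)x_{ir}$ and $H_{rs}^{(i)}(x) = h_i(x)x_{ir}x_{is}$, where

$$e_i(x) = -V_i^{-1/2}\mu_i' - \tfrac{1}{2}V_i^{-1}V_i^{(1)}\mu_i'x \tag{5}$$

and

$$h_i(x) = V_i^{-3/2}V_i^{(1)}\mu_i'^2 - V_i^{-1/2}\mu_i'' + \Bigl\{\tfrac{3}{4}V_i^{-2}V_i^{(1)2}\mu_i'^2 - \tfrac{1}{2}V_i^{-1}V_i^{(2)}\mu_i'^2 - \tfrac{1}{2}V_i^{-1}V_i^{(1)}\mu_i''\Bigr\}x. \tag{6}$$

For canonical models ($\theta = \eta$), (5) and (6) become

$$e_i(x) = -V_i^{1/2} - \tfrac{1}{2}V_i^{(1)}x \quad\text{and}\quad h_i(x) = \tfrac{1}{4}\bigl(V_i^{(1)2} - 2V_iV_i^{(2)}\bigr)x.$$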

Conditioning the score function $U_r^{(i)} = \phi V_i^{-1/2}w_i^{1/2}(Y_i - \mu_i)x_{ir}$ on $\varepsilon_i = x$ yields $U_r^{(i)}(x) = \phi w_i^{1/2}x_{ir}x$, and then using (2) we find

$$b_s^{(i)}(x) = B(\hat\beta_s) + \phi w_i^{1/2}\tau_s^TK^{-1}X^T\gamma_i x,$$

where $K^{-1} = \phi^{-1}(X^TWX)^{-1}$, $W = \mathrm{diag}\{w_i\}$ is the diagonal matrix of weights, $\tau_s$ is a $p$-vector with the $s$th element equal to one and all other elements equal to zero and $\gamma_i$ is an $n$-vector with one in the $i$th position and zeros elsewhere. Defining $M = \{m_{si}\} = (X^TWX)^{-1}X^T$, we can easily verify that

$$b_s^{(i)}(x) = B(\hat\beta_s) + w_i^{1/2}m_{si}x.$$

Cordeiro and McCullagh (1991) showed that the bias of $\hat\beta$ is given by

$$B(\hat\beta) = -(2\phi)^{-1}(X^TWX)^{-1}X^TZ_dF\mathbf{1},$$

where $F = \mathrm{diag}\{V^{-1}\mu'\mu''\}$, $Z = \{z_{ij}\} = X(X^TWX)^{-1}X^T$, $Z_d = \mathrm{diag}\{z_{ii}\}$ is a diagonal matrix with the diagonal elements of $Z$ and $\mathbf{1}$ is an $n$-vector of ones. The asymptotic covariance matrix of the MLE $\hat\eta$ of the linear predictor is simply $\phi^{-1}Z$. We obtain

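$$\sum_{r=1}^{p}H_r^{(i)}(x)b_r^{(i)}(x) = e_i(x)\sum_{r=1}^{p}x_{ir}B(\hat\beta_r) + e_i(x)w_i^{1/2}x\sum_{r=1}^{p}x_{ir}m_{ri} = e_i(x)\{w_i^{1/2}z_{ii}x + B(\hat\eta_i)\},$$

where $B(\hat\eta_i)$ is the $i$th element of the $O(n^{-1})$ bias $B(\hat\eta) = -(2\phi)^{-1}ZZ_dF\mathbf{1}$ of $\hat\eta$. The bias expression depends on the model matrix, the variance function and the first two derivatives of the link function. Also,

$$\sum_{r,s=1}^{p}H_{rs}^{(i)}(x)\kappa^{rs} = -\phi^{-1}z_{ii}h_i(x).$$

The conditional mean $\theta_x^{(i)}$ from (3) is then a second-degree polynomial in $x$ given by

$$\theta_x^{(i)} = e_i(x)\{B(\hat\eta_i) + w_i^{1/2}z_{ii}x\} + \frac{z_{ii}}{2\phi}h_i(x), \tag{7}$$

where $e_i(x)$ and $h_i(x)$ are obtained from (5) and (6).

We now compute the conditional variance $\varphi_x^{(i)}$. From (4) it follows

$$\varphi_x^{(i)} = \frac{z_{ii}}{\phi}e_i(x)^2. \tag{8}$$

Hence, $\varphi_x^{(i)}$ is also a second-degree polynomial in $x$.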
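The conditional moments of this section are easy to evaluate numerically. The sketch below is a hedged illustration, not the authors' code: it assumes that $e_i(x)$ and $h_i(x)$ are the first and second derivatives of $\varepsilon_i = V_i^{-1/2}(Y_i - \mu_i)$ with respect to $\eta$ evaluated at $Y_i - \mu_i = V_i^{1/2}x$, that the conditional mean is $e_i(x)\{B(\hat\eta_i) + w_i^{1/2}z_{ii}x\} + z_{ii}h_i(x)/(2\phi)$ and that the conditional variance is $z_{ii}e_i(x)^2/\phi$. All function names and the numerical inputs (`z_ii=0.1`, `bias_eta=-0.01`) are hypothetical.

```python
import math


def e_fn(x, V, V1, mu1):
    """e_i(x): conditional first derivative of the true residual with respect to eta."""
    return -mu1 / math.sqrt(V) - 0.5 * V1 * mu1 * x / V


def h_fn(x, V, V1, V2, mu1, mu2):
    """h_i(x): conditional second derivative of the true residual with respect to eta."""
    const = V1 * mu1 ** 2 / V ** 1.5 - mu2 / math.sqrt(V)
    slope = 0.75 * (V1 * mu1 / V) ** 2 - 0.5 * V2 * mu1 ** 2 / V - 0.5 * V1 * mu2 / V
    return const + slope * x


def conditional_mean(x, V, V1, V2, mu1, mu2, z_ii, bias_eta, phi):
    """theta_x: conditional mean of delta_i given eps_i = x, to order 1/n."""
    sqrt_w = mu1 / math.sqrt(V)  # signed square root of the weight w = mu'^2 / V
    return (e_fn(x, V, V1, mu1) * (bias_eta + sqrt_w * z_ii * x)
            + z_ii * h_fn(x, V, V1, V2, mu1, mu2) / (2.0 * phi))


def conditional_variance(x, V, V1, mu1, z_ii, phi):
    """varphi_x: conditional variance of delta_i given eps_i = x, to order 1/n."""
    return z_ii * e_fn(x, V, V1, mu1) ** 2 / phi


# Gamma model with log link at mu = 2: V = mu^2, V' = 2 mu, V'' = 2, mu' = mu'' = mu.
mu = 2.0
gamma_log = dict(V=mu ** 2, V1=2.0 * mu, V2=2.0, mu1=mu, mu2=mu)
e_val = e_fn(0.5, gamma_log["V"], gamma_log["V1"], gamma_log["mu1"])
h_val = h_fn(0.5, **gamma_log)
theta = conditional_mean(0.5, z_ii=0.1, bias_eta=-0.01, phi=4.0, **gamma_log)
var = conditional_variance(0.5, gamma_log["V"], gamma_log["V1"], gamma_log["mu1"],
                           z_ii=0.1, phi=4.0)
```

For the gamma model with log link the two functions reduce to $e_i(x) = -(1+x)$ and $h_i(x) = 1+x$, which gives a quick sanity check on the general expressions.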



3 The density of Pearson residuals 

A simple calculation from (1) gives the probability density function (pdf) of the true Pearson residual

$$f_{\varepsilon_i}(x) = \sqrt{V_i}\exp\bigl[\phi\{\sqrt{V_i}\theta_i x + \mu_i\theta_i - b(\theta_i)\} + c(\sqrt{V_i}x + \mu_i,\phi)\bigr], \tag{9}$$

where $\theta = q(\mu)$. Table 1 gives the densities of the true residuals for the normal, gamma and inverse Gaussian distributions, where $\Gamma(\cdot)$ is the gamma function.



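Table 1: Densities of the true residuals for some distributions

| Distribution | Density in (1) | Density of the true residual $f_{\varepsilon}(x)$ |
|---|---|---|
| Normal | $\frac{1}{\sqrt{2\pi\sigma^2}}\exp\bigl\{-\frac{(y-\mu)^2}{2\sigma^2}\bigr\}$ | $\frac{1}{\sqrt{2\pi\sigma^2}}\exp\bigl(-\frac{x^2}{2\sigma^2}\bigr)$, $x \in \mathbb{R}$ |
| Gamma | $\frac{1}{\Gamma(\phi)}\bigl(\frac{\phi}{\mu}\bigr)^{\phi}y^{\phi-1}\exp(-\phi y/\mu)$ | $\frac{\phi^{\phi}}{\Gamma(\phi)}(1+x)^{\phi-1}\exp\{-\phi(1+x)\}$, $x > -1$ |
| Inverse Gaussian | $\sqrt{\frac{\phi}{2\pi y^3}}\exp\bigl\{-\frac{\phi(y-\mu)^2}{2\mu^2y}\bigr\}$ | $\sqrt{\frac{\phi}{2\pi(\mu^{1/2}x+1)^3}}\exp\bigl\{-\frac{\phi x^2}{2(\mu^{1/2}x+1)}\bigr\}$, $x > -\mu^{-1/2}$ |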
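The gamma entry of Table 1 can be checked numerically. The following sketch (an illustration under our own assumptions, with hypothetical function names and an arbitrary integration grid) integrates the density of the true gamma residual $\varepsilon = (Y-\mu)/\mu$ by the trapezoid rule and verifies that it has total mass one, mean zero and variance $\phi^{-1}$, as implied by $\mathrm{Var}(Y) = \phi^{-1}V$.

```python
import math


def gamma_residual_density(x, phi):
    """Density of the true Pearson residual eps = (Y - mu)/mu for a gamma response."""
    if x <= -1.0:
        return 0.0
    log_const = phi * math.log(phi) - math.lgamma(phi)
    return math.exp(log_const + (phi - 1.0) * math.log1p(x) - phi * (1.0 + x))


def trapezoid_moments(phi, upper=12.0, steps=24000):
    """Approximate the mass, mean and variance of the density on (-1, upper)."""
    width = (upper + 1.0) / steps
    total = mean = second = 0.0
    for k in range(steps + 1):
        x = -1.0 + k * width
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        fx = gamma_residual_density(x, phi) * weight * width
        total += fx
        mean += x * fx
        second += x * x * fx
    return total, mean, second - mean ** 2


total, mean, variance = trapezoid_moments(phi=4.0)
```

The density does not depend on $\mu$, which is why the gamma model is convenient for the simulations: the reference distribution of the true residual is known up to $\phi$.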



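Throughout the following we assume that the standard regularity conditions of maximum likelihood theory are satisfied. The pdf of the Pearson residual $R_i$ in continuous GLMs to order $n^{-1}$ follows from Loynes (1969). See, also, equation (21) in the Appendix. We have

$$f_{R_i}(x) = f_{\varepsilon_i}(x) - \frac{d\{f_{\varepsilon_i}(x)\theta_x^{(i)}\}}{dx} + \frac{1}{2}\frac{d^2\{f_{\varepsilon_i}(x)\varphi_x^{(i)}\}}{dx^2}, \tag{10}$$

where $f_{\varepsilon_i}(x)$, $\theta_x^{(i)}$ and $\varphi_x^{(i)}$ come from (9), (7) and (8), respectively.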

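We now define corrected Pearson residuals for these models of the form $R_i' = R_i + \rho_i(R_i)$, where $\rho_i(\cdot)$ is a function of order $O(n^{-1})$ constructed in order to produce the residual $R_i'$ with the same distribution as $\varepsilon_i$ to order $n^{-1}$. Loynes (1969) showed (see, also, the proof given in the Appendix) that if

$$\rho_i(x) = -\theta_x^{(i)} + \frac{1}{2f_{\varepsilon_i}(x)}\frac{d\{f_{\varepsilon_i}(x)\varphi_x^{(i)}\}}{dx}, \tag{11}$$

then $f_{R_i'}(x) = f_{\varepsilon_i}(x)$ holds to order $n^{-1}$, i.e., the corrected residuals $R_i'$ have the same distribution as the true residuals to this order of approximation. Combining (8) with (9) gives

$$\frac{1}{2f_{\varepsilon_i}(x)}\frac{d\{f_{\varepsilon_i}(x)\varphi_x^{(i)}\}}{dx} = \frac{z_{ii}}{\phi}e_i(x)\frac{de_i(x)}{dx} + \frac{z_{ii}}{2\phi}e_i(x)^2\Bigl\{\phi\sqrt{V_i}\,q(\mu_i) + \frac{d}{dx}c(\sqrt{V_i}x + \mu_i,\phi)\Bigr\}, \tag{12}$$

where $de_i(x)/dx = -\tfrac{1}{2}V_i^{-1}V_i^{(1)}\mu_i'$ from (5).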
Using (11), (7) and (12), the correction function turns out to be

$$\rho_i(x) = -e_i(x)\Bigl\{B(\hat\eta_i) + w_i^{1/2}z_{ii}x + \frac{z_{ii}}{2\phi}V_i^{-1}V_i^{(1)}\mu_i'\Bigr\} - \frac{z_{ii}}{2\phi}h_i(x) + \frac{z_{ii}}{2\phi}e_i(x)^2\Bigl\{\phi\sqrt{V_i}\,q(\mu_i) + \frac{d}{dx}c(\sqrt{V_i}x + \mu_i,\phi)\Bigr\}. \tag{13}$$

Direct substitution using (13) yields the corrected Pearson residuals $R_i'$ for most models. The term $z_{ii}$ in the above equation is, up to the factor $\phi^{-1}$, just $\mathrm{Var}(\hat\eta_i)$. Although there are several terms in (13), the correction term is simple to apply to any continuous model since we need only calculate $e_i(x)$, $h_i(x)$ and $\frac{d}{dx}c(\sqrt{V_i}x + \mu_i,\phi)$ from (5), (6) and (1), the other terms being standard quantities in the theory of GLMs. More generally, the corrected residuals $R_i'$ depend on the model only through the matrix $X$, the precision parameter $\phi$, the function $c(\cdot,\cdot)$ and the variance and link functions with their first two derivatives.
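To make the structure of the correction concrete, the sketch below assembles $\rho_i(x)$ term by term and specializes it to a gamma response with log link. It is a hedged illustration: it assumes the correction is $-e_i\{B(\hat\eta_i) + w_i^{1/2}z_{ii}x\} - z_{ii}h_i/(2\phi) + z_{ii}e_i e_i'/\phi + z_{ii}e_i^2\{\phi\sqrt{V_i}q(\mu_i) + \frac{d}{dx}c\}/(2\phi)$ with $e_i' = de_i/dx$, and the closed form $(1+x)\{B(\hat\eta_i) + z_{ii}x/2\}$ used for comparison is our own algebraic simplification for this special case, not a formula quoted from the paper. The names `rho`, `rho_gamma_log`, `corrected_residual` and all numerical inputs are hypothetical.

```python
def rho(x, e, de_dx, h, sqrt_w, z_ii, bias_eta, phi, sqrtV_q, dc_dx):
    """Correction rho_i(x) built from its four ingredients.

    e, de_dx, h : e_i(x), its derivative in x, and h_i(x)
    sqrt_w      : w_i^{1/2} = V_i^{-1/2} mu_i'
    sqrtV_q     : sqrt(V_i) * q(mu_i)
    dc_dx       : d/dx of c(sqrt(V_i) x + mu_i, phi)
    """
    return (-e * (bias_eta + sqrt_w * z_ii * x)
            - z_ii * h / (2.0 * phi)
            + z_ii * e * de_dx / phi
            + z_ii * e ** 2 * (phi * sqrtV_q + dc_dx) / (2.0 * phi))


def rho_gamma_log(x, z_ii, bias_eta, phi):
    """Gamma response with log link: V = mu^2, mu' = mu'' = mu, q(mu) = -1/mu."""
    e = -(1.0 + x)                    # e_i(x) for this model
    h = 1.0 + x                       # h_i(x) for this model
    dc_dx = (phi - 1.0) / (1.0 + x)   # derivative of c(mu x + mu, phi) in x
    return rho(x, e, -1.0, h, 1.0, z_ii, bias_eta, phi, sqrtV_q=-1.0, dc_dx=dc_dx)


def corrected_residual(r, z_ii, bias_eta, phi):
    """R' = R + rho(R) for the gamma model with log link."""
    return r + rho_gamma_log(r, z_ii, bias_eta, phi)


# Hypothetical inputs: leverage-type term z_ii, bias of eta-hat, precision phi.
value = rho_gamma_log(0.3, z_ii=0.15, bias_eta=-0.02, phi=4.0)
closed_form = (1.0 + 0.3) * (-0.02 + 0.15 * 0.3 / 2.0)
r_prime = corrected_residual(0.3, z_ii=0.15, bias_eta=-0.02, phi=4.0)
```

In practice `z_ii`, `bias_eta` and `phi` would be replaced by their estimates from the fitted model; here they are fixed numbers so that the arithmetic can be followed by hand.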

The density of the true residual for the inverse Gaussian model given in Table 1 depends on the unknown mean $\mu$. However, we can estimate this density using the general expression for the corrected MLE of $\mu$, $\tilde\mu$ say, given by Cordeiro and McCullagh (1991, formula (4.4)). The resulting estimated density is identical to the true density except for terms of order smaller than $n^{-1}$, and the results of Sections 3 and 4 can also be applied to this distribution. To prove this, let $\tilde\mu = \mu + c/n^2$. Then, keeping only terms up to order $n^{-2}$, we have

$$\tilde\mu^{1/2} = \mu^{1/2}\Bigl(1 + \frac{c}{2n^2\mu}\Bigr).$$

Also,

$$(\tilde\mu^{1/2}x + 1)^{-3/2} = (\mu^{1/2}x + 1)^{-3/2}\Bigl\{1 - \frac{3xc}{4n^2\sqrt{\mu}(\sqrt{\mu}x + 1)}\Bigr\}$$

and

$$\exp\Bigl\{\frac{-\phi x^2}{2(\tilde\mu^{1/2}x + 1)}\Bigr\} = \exp\Bigl\{\frac{-\phi x^2}{2(\mu^{1/2}x + 1)}\Bigr\}\exp\Bigl\{\frac{\phi x^3c}{4n^2\sqrt{\mu}(\sqrt{\mu}x + 1)^2}\Bigr\}.$$

Hence,

$$\sqrt{\frac{\phi}{2\pi}}(\tilde\mu^{1/2}x + 1)^{-3/2}\exp\Bigl\{\frac{-\phi x^2}{2(\tilde\mu^{1/2}x + 1)}\Bigr\} = \sqrt{\frac{\phi}{2\pi}}(\mu^{1/2}x + 1)^{-3/2}\exp\Bigl\{\frac{-\phi x^2}{2(\mu^{1/2}x + 1)}\Bigr\}\Bigl(1 + \frac{c_1}{n^2}\Bigr),$$

where $c_1 = \dfrac{\phi x^3c}{4\sqrt{\mu}(\sqrt{\mu}x + 1)^2} - \dfrac{3xc}{4\sqrt{\mu}(\sqrt{\mu}x + 1)}$. From this equation it is clear that the estimated density and the true density of $\varepsilon$ are in agreement to order $n^{-1}$.

4 Some special models 



Formula (13) holds for all continuous GLMs including the models in common use: linear models, canonical models, normal models, gamma models and inverse Gaussian models. We now compute the correction $\rho_i(\cdot)$ in (13) for some important GLMs and obtain the corrected residuals $R_i' = R_i + \rho_i(R_i)$. Tables 2 and 3 give the quantities $\mu'$, $\mu''$ and $w$ for some useful link functions and $q(\mu)$, $V$, $w$ and $\frac{d}{dx}c(\sqrt{V}x + \mu,\phi)$ for the normal, gamma and inverse Gaussian distributions, respectively.

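Table 2: Values of $\mu'$, $\mu''$ and $w$ for some link functions.

| Link function | Formula | $\mu'$ | $\mu''$ | $w$ |
|---|---|---|---|---|
| Linear | $\mu = \eta$ | $1$ | $0$ | $V^{-1}$ |
| Log | $\log(\mu) = \eta$ | $\mu$ | $\mu$ | $\mu^2V^{-1}$ |
| Reciprocal | $\mu^{-1} = \eta$ | $-\mu^2$ | $2\mu^3$ | $\mu^4V^{-1}$ |
| Inverse of the square | $\mu^{-2} = \eta$ | $-\mu^3/2$ | $3\mu^5/4$ | $\mu^6V^{-1}/4$ |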
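The derivatives $\mu' = d\mu/d\eta$ and $\mu'' = d^2\mu/d\eta^2$ for these link functions can be cross-checked by finite differences. The sketch below is only a numerical sanity check under our own choices: the evaluation point `eta = 0.8`, the step size and the dictionary layout are arbitrary assumptions, and the closed forms are written as functions of $\mu$.

```python
import math

# Inverse links mu(eta) together with closed-form mu' and mu'' expressed in terms of mu.
LINKS = {
    "linear": (lambda eta: eta, lambda mu: 1.0, lambda mu: 0.0),
    "log": (lambda eta: math.exp(eta), lambda mu: mu, lambda mu: mu),
    "reciprocal": (lambda eta: 1.0 / eta, lambda mu: -mu ** 2, lambda mu: 2.0 * mu ** 3),
    "inverse_square": (lambda eta: eta ** -0.5,
                       lambda mu: -mu ** 3 / 2.0,
                       lambda mu: 3.0 * mu ** 5 / 4.0),
}


def numeric_derivatives(inverse_link, eta, step=1e-4):
    """Central finite differences for d mu / d eta and d^2 mu / d eta^2."""
    up, mid, down = inverse_link(eta + step), inverse_link(eta), inverse_link(eta - step)
    return (up - down) / (2.0 * step), (up - 2.0 * mid + down) / step ** 2


max_error = 0.0
for name, (inverse_link, d1, d2) in LINKS.items():
    eta = 0.8
    mu = inverse_link(eta)
    n1, n2 = numeric_derivatives(inverse_link, eta)
    max_error = max(max_error, abs(n1 - d1(mu)), abs(n2 - d2(mu)))
```

The weight then follows from its definition $w = V^{-1}\mu'^2$ for whichever variance function is in use.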







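Table 3: Quantities $q(\mu)$, $V$, $w$ and $\frac{d}{dx}c(\sqrt{V}x + \mu,\phi)$ for some models

| Model | $q(\mu)$ | $V$ | $w$ | $\frac{d}{dx}c(\sqrt{V}x + \mu,\phi)$ |
|---|---|---|---|---|
| Normal | $\mu$ | $1$ | $\mu'^2$ | $-\phi(x + \mu)$ |
| Gamma | $-1/\mu$ | $\mu^2$ | $\mu^{-2}\mu'^2$ | $(\phi - 1)/(1 + x)$ |
| Inverse Gaussian | $-1/(2\mu^2)$ | $\mu^3$ | $\mu^{-3}\mu'^2$ | $-\frac{3\mu^{3/2}}{2(\mu^{3/2}x + \mu)} + \frac{\phi\mu^{3/2}}{2(\mu^{3/2}x + \mu)^2}$ |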











4.1 Linear models 

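For linear models, $\mu_i = \eta_i$, $\mu_i' = 1$, $\mu_i'' = 0$, $w_i = V_i^{-1}$ and $B(\hat\eta_i) = 0$. Then, $e_i(x) = -V_i^{-1/2} - \tfrac{1}{2}V_i^{-1}V_i^{(1)}x$ and $h_i(x) = V_i^{-3/2}V_i^{(1)} + \tfrac{3}{4}V_i^{-2}V_i^{(1)2}x - \tfrac{1}{2}V_i^{-1}V_i^{(2)}x$. Hence,

$$\rho_i(x) = z_{ii}\Bigl(V_i^{-1/2} + \tfrac{1}{2}V_i^{-1}V_i^{(1)}x\Bigr)\Bigl(V_i^{-1/2}x + \frac{V_i^{-1}V_i^{(1)}}{2\phi}\Bigr) - \frac{z_{ii}}{2\phi}h_i(x) + \frac{z_{ii}}{2\phi}\Bigl\{V_i^{-1} + V_i^{-3/2}V_i^{(1)}x + \tfrac{1}{4}V_i^{-2}V_i^{(1)2}x^2\Bigr\}\Bigl\{\phi\sqrt{V_i}\,q(\mu_i) + \frac{d}{dx}c(\sqrt{V_i}x + \mu_i,\phi)\Bigr\}.$$

4.2 Canonical models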

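For canonical models, $\eta_i = \theta_i$, $w_i = V_i$, $\mu_i' = V_i$ and $\mu_i'' = V_iV_i^{(1)}$. Further, $e_i(x) = -V_i^{1/2} - \tfrac{1}{2}V_i^{(1)}x$ and $h_i(x) = \tfrac{1}{4}(V_i^{(1)2} - 2V_iV_i^{(2)})x$. Hence, substituting in (13),

$$\rho_i(x) = \Bigl(V_i^{1/2} + \tfrac{1}{2}V_i^{(1)}x\Bigr)\Bigl\{B(\hat\eta_i) + V_i^{1/2}z_{ii}x + \frac{z_{ii}V_i^{(1)}}{2\phi}\Bigr\} - \frac{z_{ii}}{8\phi}\bigl(V_i^{(1)2} - 2V_iV_i^{(2)}\bigr)x + \frac{z_{ii}}{2\phi}\Bigl(V_i^{1/2} + \tfrac{1}{2}V_i^{(1)}x\Bigr)^2\Bigl\{\phi\sqrt{V_i}\,q(\mu_i) + \frac{d}{dx}c(\sqrt{V_i}x + \mu_i,\phi)\Bigr\}.$$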
4.3 Normal models 

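For normal models, $V_i = 1$, $w_i = \mu_i'^2$, $c(x,\phi) = -\tfrac{1}{2}\{x^2\phi + \log(2\pi/\phi)\}$, $\frac{d}{dx}c(x + \mu_i,\phi) = -\phi(x + \mu_i)$, $e_i(x) = -\mu_i'$ and $h_i(x) = -\mu_i''$. We have

$$\rho_i(x) = B(\hat\eta_i)\mu_i' + \frac{z_{ii}\mu_i''}{2\phi} + \frac{z_{ii}\mu_i'^2}{2}x.$$

The normal linear model, for which $\mu = \theta = \eta$, $e_i(x) = -1$ and $h_i(x) = 0$, yields

$$\rho_i(x) = z_{ii}x/2,$$

and the corrected residuals follow as

$$R_i' = R_i(1 + z_{ii}/2).$$

Since $\mathrm{Var}(R_i) = \sigma^2(1 - z_{ii})$ in this model, we can verify that $\mathrm{Var}(R_i') = \sigma^2\{1 + O(n^{-2})\}$. A check of this expression can be obtained by considering the simplest case of independent and identically distributed observations. We have $Z = n^{-1}\mathbf{1}\mathbf{1}^T$, $z_{ii} = n^{-1}$ and then

$$R_i' = R_i\Bigl(1 + \frac{1}{2n}\Bigr),$$

which is identical to the result given in the example discussed by Loynes (1969).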
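The variance claim for the normal linear model can be tabulated directly. The sketch below uses the standard fact that, in the normal linear model, $\mathrm{Var}(R_i) = \sigma^2(1 - z_{ii})$, so that $\mathrm{Var}(R_i') = \sigma^2(1 + z_{ii}/2)^2(1 - z_{ii})$; the function name and the sample sizes are illustrative choices of ours.

```python
def variance_ratios(z_ii):
    """Return Var(R_i)/sigma^2 and Var(R_i')/sigma^2 in the normal linear model."""
    uncorrected = 1.0 - z_ii                           # Var(R_i) = sigma^2 (1 - z_ii)
    corrected = (1.0 + z_ii / 2.0) ** 2 * uncorrected  # R_i' = (1 + z_ii/2) R_i
    return uncorrected, corrected


# Independent and identically distributed case: z_ii = 1/n.
table = {n: variance_ratios(1.0 / n) for n in (5, 10, 20, 50)}
```

Expanding the product gives $1 - \tfrac{3}{4}z_{ii}^2 - \tfrac{1}{4}z_{ii}^3$, so the relative error of the corrected variance is of order $n^{-2}$ while that of the uncorrected variance is of order $n^{-1}$.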



4.4 Gamma models 



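For gamma models, $V_i = \mu_i^2$, $w_i = \mu_i^{-2}\mu_i'^2$, $c(x,\phi) = (\phi - 1)\log(x) + \phi\log(\phi) - \log\Gamma(\phi)$ and $\frac{d}{dx}c(\mu_ix + \mu_i,\phi) = (\phi - 1)/(1 + x)$. We have $e_i(x) = -\mu_i^{-1}\mu_i' - \mu_i^{-1}\mu_i'x$ and $h_i(x) = -\mu_i^{-1}\mu_i'' + 2\mu_i^{-2}\mu_i'^2 - \mu_i^{-1}\mu_i''x + 2\mu_i^{-2}\mu_i'^2x$. Then, substituting in (13) and collecting terms,

$$\rho_i(x) = (1 + x)\Bigl\{\mu_i^{-1}\mu_i'B(\hat\eta_i) + \frac{z_{ii}}{2\phi}\bigl(\mu_i^{-1}\mu_i'' - \mu_i^{-2}\mu_i'^2\bigr) + \frac{z_{ii}}{2}\mu_i^{-2}\mu_i'^2x\Bigr\}.$$

4.5 Inverse Gaussian models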

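For inverse Gaussian models, $V_i = \mu_i^3$, $w_i = \mu_i^{-3}\mu_i'^2$, $c(x,\phi) = (1/2)\log\{\phi/(2\pi x^3)\} - \phi/(2x)$ and $\frac{d}{dx}c(\mu_i^{3/2}x + \mu_i,\phi) = -\frac{3\mu_i^{3/2}}{2(\mu_i^{3/2}x + \mu_i)} + \frac{\phi\mu_i^{3/2}}{2(\mu_i^{3/2}x + \mu_i)^2}$. Further, $e_i(x) = -\mu_i^{-3/2}\mu_i' - \tfrac{3}{2}\mu_i^{-1}\mu_i'x$ and $h_i(x) = -\mu_i^{-3/2}\mu_i'' + 3\mu_i^{-5/2}\mu_i'^2 + \tfrac{15}{4}\mu_i^{-2}\mu_i'^2x - \tfrac{3}{2}\mu_i^{-1}\mu_i''x$. Then,

$$\rho_i(x) = \Bigl(\mu_i^{-3/2}\mu_i' + \tfrac{3}{2}\mu_i^{-1}\mu_i'x\Bigr)\Bigl\{B(\hat\eta_i) + \mu_i^{-3/2}\mu_i'z_{ii}x + \frac{3z_{ii}\mu_i^{-1}\mu_i'}{2\phi}\Bigr\} - \frac{z_{ii}}{2\phi}h_i(x) + \frac{z_{ii}}{2\phi}\Bigl(\mu_i^{-3/2}\mu_i' + \tfrac{3}{2}\mu_i^{-1}\mu_i'x\Bigr)^2\Bigl\{-\frac{\phi}{2\mu_i^{1/2}} + \frac{d}{dx}c(\mu_i^{3/2}x + \mu_i,\phi)\Bigr\}.$$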



5 Expansion for Cordeiro's adjusted residual 

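We now obtain the density function of the adjusted Pearson residuals proposed by Cordeiro (2004). He gave simple expressions to order $n^{-1}$ for the mean and variance of the Pearson residual $R_i$ in GLMs, namely $E(R_i) = m_i/n + O(n^{-2})$ and $\mathrm{Var}(R_i) = \sigma^2 + v_i/n + O(n^{-2})$, where

$$\frac{m_i}{n} = -\frac{\sigma^2}{2}\gamma_i^T(I - H)Jz \quad\text{and}\quad \frac{v_i}{n} = \frac{\sigma^4}{2}\gamma_i^T(QHJ - T)z,$$

$I$ is the identity matrix of order $n$, $H = W^{1/2}X(X^TWX)^{-1}X^TW^{1/2}$ is the projection matrix, $J$, $Q$ and $T$ are diagonal matrices given by $J = \mathrm{diag}\{V_i^{-1/2}\mu_i''\}$, $Q = \mathrm{diag}\{V_i^{-1/2}V_i^{(1)}\}$, $T = \mathrm{diag}\{2\phi w_i + w_iV_i^{(2)} + V_i^{-1}V_i^{(1)}\mu_i''\}$, $z = (z_{11},\ldots,z_{nn})^T$ is an $n$-vector with the diagonal elements of $Z = X(X^TWX)^{-1}X^T$, and $\gamma_i$ was defined in Section 2. Cordeiro's (2004) adjusted residuals are

$$R_i^* = \frac{R_i - \hat m_i/n}{(\hat\sigma^2 + \hat v_i/n)^{1/2}}. \tag{14}$$

Expanding $(\hat\sigma^2 + \hat v_i/n)^{-1/2}$ as $\hat\sigma^{-1}\{1 - \hat v_i/(2n\hat\sigma^2) + \cdots\}$ yields to order $n^{-1}$

$$R_i^* = \hat\sigma^{-1}\Bigl\{\Bigl(1 - \frac{\hat v_i}{2n\hat\sigma^2}\Bigr)R_i - \frac{\hat m_i}{n}\Bigr\}.$$

Since $\hat m_i = m_i + O_p(n^{-1/2})$ and $\hat v_i = v_i + O_p(n^{-1/2})$, we can write $R_i^*$ equivalently to order $n^{-1}$ as

$$R_i^* = \sigma^{-1}\Bigl\{R_i - n^{-1}\Bigl(m_i + \frac{v_iR_i}{2\sigma^2}\Bigr)\Bigr\}, \tag{15}$$

which implies trivially that $E(R_i^*) = O(n^{-3/2})$ and $\mathrm{Var}(R_i^*) = 1 + O(n^{-3/2})$. Thus, the adjusted residuals (14) have zero mean and unit variance to order $n^{-1}$.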
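The effect of the first-order expansion of the adjusted residual can be seen numerically. The sketch below compares the exact standardization $(R - m/n)/(\sigma^2 + v/n)^{1/2}$ with its first-order version $\sigma^{-1}\{R - n^{-1}(m + vR/(2\sigma^2))\}$, treating $m$, $v$ and $\sigma^2$ as known constants rather than estimates. The numerical values of `r`, `m`, `v` and `sigma2` are hypothetical and chosen only for illustration.

```python
import math


def adjusted_exact(r, m, v, sigma2, n):
    """Adjusted residual with exact standardization (known m, v, sigma^2)."""
    return (r - m / n) / math.sqrt(sigma2 + v / n)


def adjusted_first_order(r, m, v, sigma2, n):
    """First-order (in 1/n) expansion of the adjusted residual."""
    return (r - (m + v * r / (2.0 * sigma2)) / n) / math.sqrt(sigma2)


# Hypothetical values of the residual and of the moment constants.
r, m, v, sigma2 = 0.8, 0.3, 0.5, 0.25
errors = {
    n: abs(adjusted_exact(r, m, v, sigma2, n) - adjusted_first_order(r, m, v, sigma2, n))
    for n in (10, 100, 1000)
}
```

The discrepancy shrinks roughly by a factor of 100 each time $n$ grows tenfold, which is the behaviour expected of a remainder of order $n^{-2}$.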

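Let $S_i = R_i - n^{-1}\{m_i + v_iR_i/(2\sigma^2)\}$. Since $R_i = O_p(1)$, the cumulative distribution function (cdf) of $S_i$, $F_{S_i}(x)$ say, can be obtained from (15) to order $n^{-1}$ following the approach developed by Cordeiro and Ferrari (1998, Section 2):

$$F_{S_i}(x) = F_{R_i}(x) + \frac{1}{n}\Bigl(m_i + \frac{v_ix}{2\sigma^2}\Bigr)f_{R_i}(x). \tag{16}$$

Differentiation of (16) with respect to $x$, and replacing $f_{R_i}(x)$ by its asymptotic expansion in (10), yields the density of $S_i$ to the same order:

$$f_{S_i}(x) = f_{\varepsilon_i}(x) - \frac{d\{f_{\varepsilon_i}(x)\theta_x^{(i)}\}}{dx} + \frac{1}{2}\frac{d^2\{f_{\varepsilon_i}(x)\varphi_x^{(i)}\}}{dx^2} + \frac{1}{n}\frac{d}{dx}\Bigl\{\Bigl(m_i + \frac{v_ix}{2\sigma^2}\Bigr)f_{\varepsilon_i}(x)\Bigr\}. \tag{17}$$

The density function of $R_i^*$ is $f_{R_i^*}(x) = \sigma f_{S_i}(\sigma x)$, where $f_{S_i}(\sigma x)$ comes from (17) with $\sigma x$ replacing $x$. The sum of the second and third terms in (17) is expressed as $\frac{d}{dx}\{\rho_i(x)f_{\varepsilon_i}(x)\}$.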

Since $m_i/n$, $v_i/n$, $\theta_x^{(i)}$ and $\varphi_x^{(i)}$ are all quantities of order $O(n^{-1})$, the terms on the right hand side of (17), except $f_{\varepsilon_i}(x)$, are of this order and then the densities $f_{S_i}(x)$ and $f_{\varepsilon_i}(x)$ differ by terms of order $O(n^{-1})$. However, we showed in Section 3 that the densities $f_{R_i'}(x)$ and $f_{\varepsilon_i}(x)$ are equal to this order. Thus, the distribution of the corrected residuals $R_i'$, even in small samples, is closer to the distribution of the true Pearson residuals than the distribution of the adjusted residuals $R_i^*$.

A simple expansion for the density $f_{R_i^*}(x)$ of the adjusted residuals $R_i^*$ to order $n^{-1}$ for the normal model with any link function is given by

$$f_{R_i^*}(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\bigl(1 + a_0 - a_1x - a_2x^2\bigr),$$

where the constant terms $a_0$, $a_1$ and $a_2$, which depend on the model through $\sigma$, $z_{ii}$, $v_i/n$, $\mu_i'$ and $\mu_i''$, are all of order $O(n^{-1})$.



6 Simulation results 

We present some simulation results for studying the finite-sample distributions of the Pearson residual $R_i$, the corrected residual $R_i'$, the adjusted residual $R_i^*$ and the true residual $\varepsilon_i$. We use a gamma model with log link

$$\log\mu = \beta_0 + \beta_1x_1 + \beta_2x_2,$$

where the true values of the parameters were taken as $\beta_0 = 1/2$, $\beta_1 = 1$, $\beta_2 = -1$ and $\phi = 4$. The explanatory variables $x_1$ and $x_2$ were generated from the uniform $U(0,1)$ distribution for $n = 20$ and their values were held constant throughout the simulations. The number of Monte Carlo replications was set at 10,000 and all simulations were performed using the statistical software R.
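The data-generating step of this design can be sketched in a few lines. The code below is written in Python rather than R, uses only 2,000 replications to keep the run short, and computes the true residuals $\varepsilon_i = (Y_i - \mu_i)/\mu_i$ only; the seed, the variable names and the reduced replication count are our assumptions, and no model fitting is performed.

```python
import math
import random

random.seed(2024)

beta = (0.5, 1.0, -1.0)     # (beta_0, beta_1, beta_2) as stated in the text
phi = 4.0                   # precision parameter as stated in the text
n, replications = 20, 2000  # the paper uses 10,000 replications; fewer here for speed

# Covariates drawn once from U(0, 1) and then held fixed across replications.
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]
mu = [math.exp(beta[0] + beta[1] * a + beta[2] * b) for a, b in zip(x1, x2)]

true_residuals = []
for _ in range(replications):
    for m in mu:
        # Gamma response with mean m and variance m^2 / phi: shape phi, scale m / phi.
        y = random.gammavariate(phi, m / phi)
        true_residuals.append((y - m) / m)

count = len(true_residuals)
mean = sum(true_residuals) / count
variance = sum((e - mean) ** 2 for e in true_residuals) / (count - 1)
```

The sample mean and variance of the true residuals should be close to $0$ and $\phi^{-1} = 0.25$, the moments of the shifted gamma reference distribution.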

In each of the 10,000 replications, we fitted the model and computed the MLE $\hat\beta$ and fitted mean $\hat\mu$, the Pearson residuals $R_i$, the correction function $\rho(\cdot)$ and the corrected residuals $R_i'$. Further, we calculated their expected values and variances from the expressions given by Cordeiro (2004) to obtain the adjusted residuals $R_i^*$. Finally, we calculated the true residuals $\varepsilon_i$. Tables 4 and 5 give the sample means and variances, and the skewness and kurtosis, respectively, of the residuals $R_i$, $R_i'$, $R_i^*$ and $\varepsilon_i$, out of 10,000 values. The corrected residuals $R_i'$ should agree with the true Pearson residuals rather than with the normal distribution. A good agreement with the normal distribution happens when these figures are, on average, close to 0, 1, 0 and 3, respectively.

The figures in Tables 4 and 5 show that the distributions of all residuals for the gamma model are positively skewed. All four cumulants of the corrected Pearson residuals $R_i'$ are generally closer to the corresponding cumulants of the true residuals $\varepsilon_i$ than those of the other residuals. The adjusted residuals $R_i^*$ have cumulants much closer to the cumulants of a standard normal distribution, as claimed by Cordeiro (2004). Further, the distribution of the corrected residuals is generally closer to the distribution of the true residuals than the distribution of the Pearson residuals. In short, the correction $\rho(\cdot)$ appears to be effective even when the sample size is small.

In Table 6 we give the values of the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) (see, for instance, Anderson and Darling, 1952; Thode, 2002, Section 5.1.4) distances between the empirical distribution of each set of the 10,000 uncorrected $R_i$ and corrected $R_i'$ residuals for $i = 1,\ldots,20$, and the estimated distribution of the true residuals. The estimated distribution here is the shifted gamma distribution with dispersion parameter $\phi$ taken to be the sample average of the estimated dispersion parameters at each step of the Monte Carlo simulation. In Table 7, we follow the same procedure as for Table 6, but we now examine whether the uncorrected $R_i$ and corrected $R_i'$ residuals follow the empirical distribution of the true residual $\varepsilon_i$. We then calculated both K-S and A-D distances between the empirical distributions of both (uncorrected and corrected) residuals and the empirical distribution of the true residuals $\varepsilon_i$.
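For readers who want to reproduce the two-sample comparison, a minimal sketch of the two-sample K-S distance is given below. It is a plain implementation of the usual definition, the largest absolute difference between two empirical distribution functions, and is not the routine used by the authors; the function name and the toy samples are ours.

```python
import bisect


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov distance sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    distance = 0.0
    # The supremum is attained at one of the observed points.
    for value in a + b:
        fa = bisect.bisect_right(a, value) / len(a)  # empirical cdf of a at value
        fb = bisect.bisect_right(b, value) / len(b)  # empirical cdf of b at value
        distance = max(distance, abs(fa - fb))
    return distance


same = ks_two_sample([1, 2, 3], [1, 2, 3])
disjoint = ks_two_sample([1, 2, 3], [4, 5, 6])
shifted = ks_two_sample([1, 2, 3], [1.5, 2.5, 3.5])
```

In the setting of Table 7, `a` would hold the simulated corrected (or uncorrected) residuals for a given observation and `b` the simulated true residuals.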

We see from Tables 6 and 7 that the distribution of the corrected residuals is closer to 
the distribution of the true residuals than the distribution of the uncorrected residuals. 
Furthermore, the distances for the corrected residuals are substantially smaller than the 
distances for the uncorrected ones. These facts show that, when the model is well- 
specified, our correction works very well for the set of the corrected residuals. 

We conclude the study by providing an application of the corrected residuals to assess the adequacy of the above gamma model. We could expect that, under a well-specified model, the distribution of the corrected residuals will follow approximately the distribution of the true residuals. However, even though it is common to compare the distribution of the Pearson residuals with the normal distribution, it is not clear that this approximation should be good in small samples. Therefore, we compare the empirical distribution of the corrected residuals with the distribution of the true residuals and the distribution of the uncorrected residuals with the normal distribution. To do this, we use a QQ-plot, which displays the sample quantiles of the corrected and uncorrected residuals versus theoretical quantiles from the estimated distribution of the true residuals and the normal distribution with mean zero and variance $\phi^{-1}$, respectively. If the distribution of the corrected residuals is well approximated by the distribution of the true residuals, the plot will be close to linear. Therefore, we expect that a QQ-plot of the Studentized corrected residuals versus the estimated distribution of the true residuals should be closer to the diagonal line than the QQ-plot of the uncorrected residuals against the normal $N(0,\phi^{-1})$ distribution. Moreover, we also consider the QQ-plot of the adjusted residuals suggested by Cordeiro (2004) against the theoretical quantiles of a standard normal distribution.

Figure 1 gives two QQ-plots, one for the vector of the 10,000 ordered uncorrected residuals and the other for the vector of the 10,000 ordered corrected residuals. These figures show that even for a well-specified model, the plot for the uncorrected residuals is very distant from the diagonal line when compared with the plot for the corrected residuals. The adjusted residuals given in Figure 2 provide an improvement over the uncorrected residuals, but the plot is also distant from the diagonal line when compared to the corrected residuals. Therefore, the corrected residuals behave well and lead to the right conclusion, i.e., that the model is well-specified. We thus recommend the corrected residuals for building QQ-plots.

Table 4: Mean and variance of uncorrected, corrected, adjusted and true residuals.

Table 5: Skewness and kurtosis of uncorrected, corrected, adjusted and true residuals.

7 Conclusion

Using the results given in Loynes (1969), we calculate the $O(n^{-1})$ distribution of the Pearson residuals in GLMs (see, for instance, McCullagh and Nelder, 1989). It is important to mention that the distribution of residuals in regression models is typically unknown, and therefore all inference regarding these residuals is based on asymptotic assumptions which may not hold in small or moderate sample sizes. We can then use this knowledge to define corrected Pearson residuals in these models in such a way that the corrected residuals will have, to order $O(n^{-1})$, the same distribution as the true Pearson residuals, which is known. The corrected residuals have practical applicability for all continuous GLMs. We simulate a gamma model with log link to conclude the superiority of the corrected Pearson residuals $R_i'$ over the uncorrected residuals $R_i$ and also over the adjusted residuals suggested by Cordeiro (2004) with regard to the approximation to the reference distribution, which for the corrected and uncorrected residuals was the distribution of the true residuals and for the adjusted residuals was the standard normal distribution. The paper is concluded with an application of the corrected residuals to assess the adequacy of the model.

Table 6: One-sample K-S and A-D statistics for uncorrected and corrected residuals.

Appendix

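Suppose we write the residual $R$ in terms of the true residual $\varepsilon$ as $R = \varepsilon + \delta$, where $\varepsilon$ and $\delta$ are absolutely continuous random variables with respect to Lebesgue measure and $\delta$ is of order $O_p(n^{-1/2})$. Our goal is to define a corrected residual $R'$ having the same density as $\varepsilon$ to order $n^{-1}$. Initially, we have

$$E(e^{isR}) = E\{e^{is\varepsilon}E(e^{is\delta} \mid \varepsilon)\} \quad\text{and}\quad \frac{\partial^k}{\partial s^k}E(e^{is\delta} \mid \varepsilon)\Big|_{s=0} = i^kE(\delta^k \mid \varepsilon).$$

Expanding $E(e^{is\delta} \mid \varepsilon)$ in a Taylor series around $s = 0$ gives

$$E(e^{is\delta} \mid \varepsilon) = 1 + (is)E(\delta \mid \varepsilon) + \frac{(is)^2}{2}E(\delta^2 \mid \varepsilon) + \cdots.$$

Let $\theta_x = E(\delta \mid \varepsilon = x)$ and $\varphi_x = \mathrm{Var}(\delta \mid \varepsilon = x)$. Thus,

$$E\{e^{is\varepsilon}E(e^{is\delta} \mid \varepsilon)\} = \int e^{isx}\Bigl\{1 + (is)\theta_x + \frac{(is)^2}{2}(\varphi_x + \theta_x^2) + \cdots\Bigr\}f_{\varepsilon}(x)\,dx, \tag{18}$$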

where $f_{\varepsilon}(\cdot)$ is the density function of $\varepsilon$. By using formulae (25) and (26) from Cox and Snell (1968) with $\varepsilon = 0$, it is possible to conclude that $E(\delta)$ and $\mathrm{Var}(\delta)$ (and thus $E(\delta^2)$) are of order $O(n^{-1})$ and, in the same way, that the higher moments of $\delta$ are of order $o(n^{-1})$. In a similar manner, we can show that $E(\delta \mid \varepsilon = x)$ and $\mathrm{Var}(\delta \mid \varepsilon = x)$ are also of order $O(n^{-1})$, and that the higher-order conditional moments are of order $o(n^{-1})$. Then, we can rewrite equation (18) as

$$E\{e^{is\varepsilon}E(e^{is\delta} \mid \varepsilon)\} = \int e^{isx}\Bigl\{1 + (is)\theta_x + \frac{(is)^2}{2}\varphi_x\Bigr\}f_{\varepsilon}(x)\,dx + o(n^{-1}). \tag{19}$$

Table 7: Two-sample K-S and A-D statistics for uncorrected and corrected residuals.

Note that we can express the integral on the right side of (19) as a sum of three integrals. Then, integration by parts, once for the integral containing $\theta_x$ in the integrand and twice for the integral containing $\varphi_x$ in the integrand, yields the following formula:

$$E(e^{isR}) = \int e^{isx}\Bigl\{f_{\varepsilon}(x) - \frac{d\{f_{\varepsilon}(x)\theta_x\}}{dx} + \frac{1}{2}\frac{d^2\{f_{\varepsilon}(x)\varphi_x\}}{dx^2}\Bigr\}dx + o(n^{-1}). \tag{20}$$

The uniqueness theorem for characteristic functions yields the density of $R$ to order $n^{-1}$:

$$f_R(x) = f_{\varepsilon}(x) - \frac{d\{f_{\varepsilon}(x)\theta_x\}}{dx} + \frac{1}{2}\frac{d^2\{f_{\varepsilon}(x)\varphi_x\}}{dx^2}. \tag{21}$$

Equation (21) is identical to formula (5) in Loynes (1969).

Further, we now define corrected residuals of the form $R' = R + \rho(R)$, where $\rho(\cdot)$ is a function of order $O(n^{-1})$ used to recover the distribution of $\varepsilon$. We may proceed as above, noting that $E\{\rho(R) \mid R = x\} = \rho(x)$, to obtain the density of $R'$ to order $n^{-1}$:

$$f_{R'}(x) = f_R(x) - \frac{d}{dx}\{\rho(x)f_R(x)\}.$$

Since the quantities $\rho(x)$, $\theta_x$ and $\varphi_x$ are all of order $O(n^{-1})$, we have that $\frac{d}{dx}\{\rho(x)f_R(x)\} = \frac{d}{dx}\{\rho(x)f_{\varepsilon}(x)\}$ to this order. Therefore, the densities of $R'$ and $\varepsilon$ will be the same to order $n^{-1}$ if

$$\frac{d}{dx}\{\rho(x)f_{\varepsilon}(x)\} = -\frac{d\{f_{\varepsilon}(x)\theta_x\}}{dx} + \frac{1}{2}\frac{d^2\{f_{\varepsilon}(x)\varphi_x\}}{dx^2}.$$

Integration gives

$$\rho(x) = -\theta_x + \frac{1}{2f_{\varepsilon}(x)}\frac{d\{f_{\varepsilon}(x)\varphi_x\}}{dx}. \tag{22}$$

Equation (22) is identical to equation (6) given by Loynes (1969), and it is clear from the proof that the support of $\varepsilon$ does not need to be the entire line: proper intervals are allowed as support. We should note that the assumptions needed can be weakened if we only require a Taylor polynomial of order two with a remainder term (for instance, the Lagrange remainder) instead of the complete series.

We could also prove Loynes' (1969) results by using the equivalence of (3c) and (4c), together with (5) and (6) of Cox and Reid (1987) and appropriate regularity conditions. The idea of this approach is as follows: consider in equation (3c) of Cox and Reid (1987) $X_0 = \varepsilon$, $X_1 = n^{1/2}\delta$ and $X_2 = 0$. This means that we are writing $Y_n$ as $Y_n = \varepsilon + \delta + O_p(n^{-3/2})$, where $\varepsilon$ and $\delta$ are of orders $O_p(1)$ and $O_p(n^{-1/2})$, respectively. Then, from (4c), (5) and (6) of Cox and Reid (1987), we can write the cdf of $Y_n$ as

$$G_n(y) = F_0(y) - E(\delta \mid \varepsilon = y)f_0(y) + \frac{1}{2}\frac{d}{dy}\{E(\delta^2 \mid \varepsilon = y)f_0(y)\} + O(n^{-3/2}),$$

where $F_0(\cdot)$ and $f_0(\cdot)$ are the cdf and pdf of $\varepsilon$, respectively. The expression above implies equation (21). We can also obtain the expansion for $R + \rho(R)$ from the equivalence of (3c) and (4c) of Cox and Reid (1987) by setting $X_0 = R$, $X_1 = 0$ and $X_2 = n\rho(R)$. The rest of the proof is identical to the one given before. Note also that for this proof $\varepsilon$ does not need to have support on the entire line, since this is not an assumption in the usual regularity conditions.

Figure 1: QQ-plots for the Pearson and corrected residuals

Figure 2: QQ-plot for the adjusted residuals